Abstract. We investigate the properties of correlation based networks originating from economic complex
systems, such as the network of stocks traded at the New York Stock Exchange (NYSE). The weaker
links (low correlation) of the system are found to contribute to the overall connectivity of the network
significantly more than the strong links (high correlation). We find that nodes connected through strong
links form well-defined communities. These communities are clustered together in more complex ways
than the widely used classification by economic activity would suggest. We find that some companies,
such as General Electric (GE), Coca Cola (KO), and others, can belong to several different communities. The
communities are found to be quite stable over time. Similar results were obtained by investigating markets
completely different in size and properties, such as the Athens Stock Exchange (ASE). The present method
may also be useful for other networks generated through correlations.

PACS. 89.65.-s Social and economic systems - 89.75.-k Complex systems - 89.90.+n Other topics in
areas of applied and interdisciplinary physics (restricted to new topics in section 89)

1 Introduction

Recently there has been a growing interest in better understanding complex systems. A complex system is generally composed of many elements interacting in various ways, and a network representation has been found useful to characterise the system, by associating each element with a node and each interaction with a link (weighted or not). To understand the network structure and function, various tools from statistical physics have been developed. These tools [1-10] enable us to extract useful information and to better describe the properties of complex systems. Examples of complex systems that have recently been investigated from this perspective include the Internet [11,12], the World Wide Web [13], communication networks [14], food webs [15], sexual contact networks [16] and economic networks [17,18,21].

The problem of extracting useful information from a system becomes more difficult in the case of correlation based networks, since these networks are usually complete graphs (all links between elements are present). On the other hand, understanding the behavior of networks originating from empirical correlation matrices is a very important task in many scientific fields, since correlation matrices appear in the study of multivariate time series. In order to make correlation based networks simpler to understand and to extract information from them, the use of filtering techniques was suggested [17,18]. Filtering techniques, like the Minimum Spanning Tree (MST) [17], reduce the number of links of the network while keeping all the nodes connected with a maximum total weight (equivalently, a minimum total distance).

In this work we study correlation based networks by exploring the evolution and temporal dynamics of the structures that occur after the removal of a certain fraction $q$ of links, without forcing all the nodes to remain connected to the network. We identify a particular value of this fraction, $q \approx 0.995$, close to which the structural properties of the network become clearer. We therefore study the communities of stocks at this particular point.

2 Methods: Creating and Destructing the Network

In many complex systems the network is built by using correlations between the dynamics of the nodes. This is, for example, the case in economic networks, where a weighted link is assigned between two nodes representing different stocks according to the cross correlation between their price time series.

In the present study we create a correlation based network from a portfolio comprising 1062 stocks traded at the New York Stock Exchange (NYSE) in the period 1987 to 1998. From the daily closing price time series we create a correlation based network following the procedure described below. We first calculate the correlation coefficient between stocks $i$ and $j$, defined as

$$\rho_{ij} = \frac{\langle r_i r_j\rangle - \langle r_i\rangle\langle r_j\rangle}{\sqrt{\left(\langle r_i^2\rangle - \langle r_i\rangle^2\right)\left(\langle r_j^2\rangle - \langle r_j\rangle^2\right)}}, \qquad (1)$$

where $\langle\ldots\rangle$ is the time average over the investigated time period. Here $r_i(t)$ is the logarithmic return, defined by $r_i(t) = \ln P_i(t) - \ln P_i(t-\Delta t)$, where $P_i(t)$ is the daily closing price of stock $i$ at day $t$. If two stocks, $i$ and $j$, are completely correlated (anti-correlated) then $\rho_{ij} = +1$ ($-1$), while if the two stocks are completely uncorrelated then $\rho_{ij} = 0$ (see footnote 1). In our case $\Delta t = 1$ day. By calculating the correlation coefficient for all pairs of stocks, we obtain the correlation coefficient matrix of the system. Such matrices were studied in [19,20] and are known to contain a large amount of noise, which can be attributed to false correlation estimates due to the finite length of the time series.

1 This is true for linearly correlated time series.

Fig. 1. Visualization of the different network structures that occur after removing 99.5% of the links of the initial fully connected network. (a) Pictorial representation of the cross correlation matrix of a portfolio of 1062 stocks traded at the NYSE. (b) The network after removing the stronger links of the initial fully connected network. (c) The network after removing the weaker links of the initial fully connected network. (d) A community of 16 stocks, belonging to 3 overlapping cliques of 14 elements. The ticker symbols of the stocks forming this community are: AIG(tan), AXP(tan), BMY(red), C(tan), CL(red), DIS(gray), EMR(green), FNM(tan), GE(green), JPM(tan), KO(red), MER(tan), MMC(tan), SGP(red), TA(tan) and TY(tan). (e) A community of 19 stocks, belonging to 3 overlapping cliques of 17 elements. The ticker symbols of the stocks forming this community are: AEP(blue), AIT(blue), BEL(blue), BGE(blue), BLS(blue), CPL(blue), CSR(blue), D(blue), DUK(blue), ED(blue), FPL(blue), GTE(blue), KO(red), NSP(blue), PCG(blue), PEG(blue), SBC(blue), SO(blue) and USW(blue). The color codes that we use follow the Standard Industrial Classification (SIC) system for classifying industries.

An empirical correlation matrix can be viewed as a fully connected weighted network by transforming the correlation coefficient into a distance, using an appropriate function as a metric [17]. The function that we used for this transformation is [17]

$$d_{ij} = \sqrt{2\,(1 - \rho_{ij})}, \qquad 0 \le d_{ij} \le 2, \qquad (2)$$

where small values of the distance $d_{ij}$ imply strong correlation for the pair of stocks $i$ and $j$, and vice versa.
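As an illustration of Eqs. (1) and (2), the following minimal sketch computes logarithmic returns, the correlation coefficient and the distance directly from daily closing prices, using only the Python standard library. The function names and the `{ticker: prices}` input layout are our own illustrative assumptions, not the code used for the analysis; the sketch assumes aligned price series of equal length, none of them constant.

```python
import math

def log_returns(prices):
    """r(t) = ln P(t) - ln P(t - 1): logarithmic returns with Delta t = 1 day."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def correlation(ri, rj):
    """Eq. (1): rho_ij built from time averages <...> over the whole period."""
    n = len(ri)
    mi, mj = sum(ri) / n, sum(rj) / n
    cov = sum(a * b for a, b in zip(ri, rj)) / n - mi * mj
    var_i = sum(a * a for a in ri) / n - mi * mi
    var_j = sum(b * b for b in rj) / n - mj * mj
    return cov / math.sqrt(var_i * var_j)  # assumes neither series is constant

def distance(rho):
    """Eq. (2): d_ij = sqrt(2 (1 - rho_ij)); clamped so rounding cannot give sqrt(<0)."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))

def distance_matrix(prices_by_ticker):
    """Map {ticker: closing prices} to {(i, j): d_ij} for every pair i < j."""
    returns = {s: log_returns(p) for s, p in prices_by_ticker.items()}
    names = sorted(returns)
    return {
        (a, b): distance(correlation(returns[a], returns[b]))
        for idx, a in enumerate(names)
        for b in names[idx + 1:]
    }
```

For the 1062 NYSE stocks this would produce 1062 x 1061 / 2 = 563,391 distances, i.e. the fully connected weighted network; in practice a vectorised implementation (for example with NumPy) would be preferable to these explicit loops.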



Next we investigate two methods of removing links from this fully connected network, both resulting in sparser graphs with totally different properties. From these differences we can learn about the structure of the network. We begin by sorting the weights in increasing order. In the first method we repeatedly remove links from lower to higher values of $d_{ij}$ (high to low correlations). In the second method we repeatedly remove links starting from the highest and moving to the lowest values of $d_{ij}$ (low to high correlations). A similar approach to our second method was implemented by Onnela et al. [21]. In that work the authors used two sets, one of 116 and one of 477 stocks traded in the NYSE, and they began adding links to the initially completely disconnected network, from the highest to the lowest correlation values.

We find that after removing the more strongly correlated links of the network (links with low $d_{ij}$ values), the network remains connected until we have removed almost 99% of its original connections. By contrast, when we remove links starting from the most weakly correlated ones (links with high $d_{ij}$ values), the network starts to lose its nodes much earlier, after removing only about 30% of its original weakest links (see Fig. 4(a)). This result suggests that strong links and weak links play very different roles in the topology. While strong links are usually situated in positions that increase the local connectivity within the communities, the weak links contribute more to the global connectivity, i.e. to the connections between communities. An interesting observation is that in both cases the disintegration of the network takes place gradually, mainly because some very small clusters become disconnected from the large cluster.

In Figure 1 we plot some representative results of the above procedure. Fig. 1(a) shows a representation of the cross correlation matrix of a portfolio of 1062 stocks traded at the NYSE. In Fig. 1(b) we draw the network left after removing 99.5% of the stronger links of the initial fully connected network, and in Fig. 1(c) we draw the network left after removing the same fraction of the weaker links. From this figure it is apparent that the two networks, even by visual inspection, are totally different. When we remove the stronger links we get a network that has more nodes but less structure, while when we remove the weaker links the resulting network has fewer nodes, but these nodes are clustered together mainly in accordance with the sector of economic activity. In order to classify the stocks into different sectors we used the Standard Industrial Classification (SIC) system for the classification of industries [22].

To further investigate the way these clusters are interconnected, we apply the k-clique method to detect communities [23] after removing a fraction $q = 0.995$ of the weak links. The reason for choosing this value will become clear later, when we find that close to this value the structural properties of the communities are easier to see. A maximal complete subgraph of a network is called a clique. In addition, a smaller complete subgraph with $k$ nodes, which in general can be included in a larger one, is called a k-clique. In a network, a large complete subgraph of size $s$ ($k < s$) contains many different smaller complete subgraphs of size $k$ (k-cliques), $\binom{s}{k}$ of them in total. The algorithm we use is able to calculate all the k-cliques of the network, provided the network is sparse enough, and therefore it allows us to identify a wealth of communities of stocks. Examples of such communities are shown in Figure 1(d), Figure 1(e) and Figure 2.
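The details of the k-clique algorithm of [23] are not reproduced here. Assuming it is the clique percolation method, in which a community is a union of k-cliques that can be reached from one another through adjacent k-cliques (k-cliques sharing $k-1$ nodes), a minimal pure-Python sketch could use the standard equivalent formulation: two maximal cliques of size at least $k$ belong to the same community if they share at least $k-1$ nodes. The same maximal-clique enumeration also gives the relative clique measures used in Section 3; counting maximal cliques (two-node ones included) for those measures is our assumption. All function names are hypothetical.

```python
from itertools import combinations

def adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def maximal_cliques(adj):
    """Enumerate all maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended any further: it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def k_clique_communities(adj, k):
    """Clique percolation: merge maximal cliques (size >= k) sharing >= k - 1 nodes."""
    cliques = [frozenset(c) for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) >= k - 1:
            parent[find(a)] = find(b)
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())  # a node may appear in several communities

def clique_statistics(adj):
    """Relative number of maximal cliques and relative maximum clique size."""
    if not adj:
        return 0.0, 0.0
    cliques = maximal_cliques(adj)
    n_nodes = len(adj)  # only nodes that still have at least one link
    return len(cliques) / n_nodes, max(len(c) for c in cliques) / n_nodes
```

Because communities are assembled from cliques rather than from nodes, a stock can end up in more than one community, which is the kind of overlap discussed around Table 1. Clique enumeration is exponential in the worst case, so this is only practical for very sparse networks such as those obtained at $q \approx 0.995$.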
Fig. 2. Further examples of communities of stocks that we
identified by applying the k-clique method to the network that
remains after removing 99.5% of the weaker links of the original
fully connected network.




Fig. 3. Minimum Spanning Tree (MST) of the connected
nodes of the network after removing 99.5% of the weaker links
of the initially fully connected network. The nodes marked
here with their corresponding symbols are those belonging to
the two example communities described in Fig. 1.
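The MST used in Fig. 3 and in the comparison of Section 3 keeps $n-1$ links with minimal total distance. The text does not state how the tree was computed; the following sketch uses Kruskal's algorithm with a union-find structure on the distances $d_{ij}$ of Eq. (2). The function name and the `{(i, j): d_ij}` input format are illustrative assumptions.

```python
def minimum_spanning_tree(dist):
    """Kruskal's algorithm on {(i, j): d_ij}; returns n - 1 links as (i, j, d_ij)."""
    nodes = {u for pair in dist for u in pair}
    parent = {u: u for u in nodes}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    # Scan links from the smallest distance (strongest correlation) upwards.
    for (u, v), d in sorted(dist.items(), key=lambda item: item[1]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the link joins two different trees, so it is kept
            parent[ru] = rv
            tree.append((u, v, d))
            if len(tree) == len(nodes) - 1:
                break
    return tree
```

Since every node stays in the tree, the MST cannot drop weakly connected stocks the way the link-removal procedure does; this is one reason why the communities found with the k-clique method can appear fragmented on the MST.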

3 Results 

The first finding is that when we remove the weaker links,
the nodes are clustered according to the sector of economic
activity, which is generally expected, since intra-sector
correlations between stocks are usually very strong.
Typical mean values of the weights of links classified as
intra-sector, inter-sector, intra-sub-sector, etc., and their
estimated standard deviations were computed by Tumminello
et al. [24] using a bootstrap technique. However, with
our approach we can also identify, in the same commu-
nity, nodes from different sectors. As an example, in Fig-
ure 1(d) we find a very well-connected community of the
stocks AIG(tan), AXP(tan), BMY(red), C(tan), CL(red),
DIS(gray), EMR(green), FNM(tan), GE(green), JPM(tan),
KO(red), MER(tan), MMC(tan), SGP(red), TA(tan) and
TY(tan). As can be seen from the color code, these stocks
belong to 4 different sectors. This result is probably
related to other activities of the companies that are not
reflected by their main sector classification, but that
affect the performance of the stocks in a non-trivial
way.

In Fig. 2 we present a variety of further communities
of stocks belonging to one or more sectors of economic
activity. Most of these communities are almost fully con-
nected subgraphs of the network. From Table 1, where the
ticker symbols of each community's stocks are listed, we can
identify several stocks that belong to more than one com-
munity. This finding of overlapping communities is novel
and important, since it indicates that a number of
stocks can influence many other stocks, or even groups
of stocks belonging to different sectors, and vice
versa. Examples of such stocks that we identify from Ta-
ble 1 are General Electric (GE), Coca Cola (KO), Exxon
Mobil (XON) and Procter & Gamble (PG). All the above
examples refer to very large capitalization companies, which points towards the significance of such blue chips for the overall market activity.

Another widely used method to identify clusters in a network of stocks according to the sector of economic activity is the Minimum Spanning Tree (MST) technique [17,21,25,26]. An MST analysis of a dataset similar to the one we use was performed by Bonanno et al. [27]. This technique filters the original correlation matrix and keeps only a tree of $n-1$ links out of the original $n(n-1)/2$ links, with total minimal distance. Applying this method to the original fully connected network, or to the network given in Fig. 1(c), yields a nice clustering of stocks according to their economic activity, but the communities we identified in Fig. 1 are fragmented and, as shown in Fig. 2, this fragmentation is more pronounced for stocks belonging to different economic sectors.

Next we focus on other properties of the network. In Fig. 4(a) we compare the size of the largest cluster of the network versus the fraction of removed links $q$, using weak removal, strong removal and random removal. For random removal the network remains connected until we have removed over 99.9% of the original links. Indeed, after we remove 99.9% of the original links the value of the parameter $\kappa$ [8] becomes $\kappa \equiv \langle k^2\rangle/\langle k\rangle = 2$, where $k$ is the degree of the nodes and $\langle\ldots\rangle$ is the average over all the nodes. At this point there is a percolation transition and the network breaks into clusters of connected components. As we can see from Fig. 4(b), the second largest cluster has a maximum size at $q = 0.999$, and its size is comparable to the size of the original network. This behavior is completely different from the way the network disintegrates when we first remove its stronger or its weaker links. In both of these cases, as we remove the links of the network, some isolated nodes, or even some small clusters of nodes, gradually lose all their links and are removed from the network; therefore, the network is stripped of its nodes and becomes disconnected without a sharp percolation transition. The disintegration of the network is faster when we remove the weaker correlated links. This result shows that the weak links are responsible for the global connectivity of the network, while most of the strong links form local structures.

For the analysis that follows we calculated further properties of the remaining connected component of the network as a function of the fraction of removed links $q$, applying all three removal procedures described above.

A property that plays an important role in the structure and connectivity of many different kinds of networks is the formation of cliques. This property can be easily understood in the case of social networks, where cliques represent circles of friends or acquaintances in which every member knows every other member of the clique well, but usually does not know members of other cliques. One way to quantify the tendency to cluster in this way is the clustering coefficient $C(q)$ [3,4], which is defined as follows. If a vertex $i$ has $k_i$ neighbors, then at most $k_i(k_i-1)/2$ edges can exist between them (this occurs when every neighbor of $i$ is connected to every other neighbor of $i$). Let $C_i(q)$ denote the fraction of such existing edges for node $i$; then
$C(q)$ is defined as the average of $C_i(q)$ over all connected nodes of the network.

We used the clustering coefficient $C(q)$ to compare the connectivity of the network structures that survive after removing a fraction $q$ of the original links, for the three link removal procedures mentioned above. The results of this analysis are shown in Figure 4(c). Note that the network in which the strongest links survive always has a higher clustering coefficient, again showing that strong links make more connections locally, while weak links are responsible for the global network structure.

In addition to the clustering coefficient, there exists another useful quantity that can yield more information on the structure of the network: the total number of cliques and their size. We calculate the relative number of cliques $\mathcal{N}_{\mathrm{cl}}(q)$, defined as the total number of cliques $N_{\mathrm{cl}}(q)$ that exist in the network divided by the number of its nodes $N_{\mathrm{nodes}}(q)$,

$$\mathcal{N}_{\mathrm{cl}}(q) = N_{\mathrm{cl}}(q)/N_{\mathrm{nodes}}(q).$$

We also calculate the relative maximum clique size $\mathcal{N}_{\mathrm{max}}(q)$, defined as the ratio between the maximum clique size $\mathrm{Max}_{\mathrm{cl}}(q)$ in the network and the number of nodes,

$$\mathcal{N}_{\mathrm{max}}(q) = \mathrm{Max}_{\mathrm{cl}}(q)/N_{\mathrm{nodes}}(q).$$

Finding whether there is a clique of a given size in a graph is an NP-complete problem. We therefore studied the behavior of the above quantities only in the range close to $q = 0.99$, where the network is very sparse, but this is enough to help us draw some interesting conclusions. Results of this analysis are shown in Figure 4(d).

From Figures 4(c) and 4(d) we can see that the network remaining after removing the stronger links, although it has more nodes than the network obtained after removing the same fraction of the weaker links, has much less internal structure (it has fewer and smaller cliques). This means that inside the original network there exists a well-defined underlying structure of strongly connected components, and a bulk of weaker, less meaningful connections. The more weak links we remove from the network, the more visible this structure becomes. This is the reason behind the sharp peaks in Figures 4(c) and 4(d). This increase of $C(q)$, $\mathcal{N}_{\mathrm{cl}}(q)$ and $\mathcal{N}_{\mathrm{max}}(q)$ with $q$, which starts to occur around $q = 0.998$ only when removing weak links, suggests that in this regime we are able to uncover some of the most important structural features of the network. This justifies our analysis of the communities in Figs. 1 and 2 at $q = 0.995$. We find similar results for $0.995 < q < 0.999$.

Next, we study the dynamic evolution of the networks by comparing links using annual data. We approach this by analysing the network in a similar fashion to the analysis of the dynamics of Minimum Spanning Trees [25,26]. The single-step similarity probability is a measure of how many common links exist in the networks of two consecutive years, after we remove the same fraction $q$ of links. The single-step similarity probability is defined as:

$$S^q(t) = \frac{|E(t) \cap E(t-1)|}{|E(t)|}, \qquad (3)$$
Fig. 4. (a) The number of nodes belonging to the largest cluster of the network versus the fraction of removed links q. Inset: a
zoom of the area around q = 0.99. (b) The number of nodes belonging to the second largest cluster of the network versus the
fraction of removed links q. Inset: a zoom of the area around q = 0.99. (c) Clustering coefficient C(q) for the three different
cases of the link removal procedure. Inset: a zoom of the area around q = 0.99. (d) The relative number of cliques $\mathcal{N}_{\mathrm{cl}}(q)$
after removing a fraction q of links, versus the fraction of removed links q. Inset: the relative maximum clique size
$\mathcal{N}_{\mathrm{max}}(q)$ after removing a fraction q of links, versus the fraction of removed links q.



where $E(t)$ is the set of edges of the network at time $t$, $\cap$
is the intersection operator and the operator $|\ldots|$ gives
the number of elements in the set.

Accordingly, the multi-step similarity probability at time
$t$, after we remove a fraction $q$ of the links of the initial network, is
defined as:

$$S^q_T(t) = \frac{1}{|E(t)|}\,\big|E(t) \cap E(t-1) \cap \cdots \cap E(t-T+1) \cap E(t-T)\big|, \qquad (4)$$



where only the edges that are continuously present in the
network over the $T$ time steps are counted. Plots of the above
quantities are shown in Figure 5. From these plots we can
see that when we remove the weaker correlated links we
are left with a small, stable, and very strongly connected
network (Figs. 5(b) and (d)), while when we remove the
stronger correlated links we are left with a larger network
that is not stable over time (Figs. 5(a) and (c)).
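Eqs. (3) and (4) translate directly into set operations. In the following sketch (hypothetical function names) each yearly network is given as a list of $(i, j)$ links that survive the removal of the same fraction $q$; links are stored as frozensets, which is our choice, so that $(i, j)$ and $(j, i)$ count as the same undirected link.

```python
def _edge_set(edges):
    # frozenset({i, j}) makes (i, j) and (j, i) the same undirected link
    return {frozenset(e) for e in edges}

def single_step_similarity(edges_t, edges_prev):
    """Eq. (3): S^q(t) = |E(t) & E(t-1)| / |E(t)|."""
    e_t = _edge_set(edges_t)
    return len(e_t & _edge_set(edges_prev)) / len(e_t)

def multi_step_similarity(history, T):
    """Eq. (4): fraction of the links of year t present in every year t-T, ..., t.

    history is a chronological list of yearly edge lists; history[-1] is year t.
    """
    if T < 1 or len(history) < T + 1:
        raise ValueError("need at least T + 1 yearly networks")
    sets = [_edge_set(edges) for edges in history[-(T + 1):]]
    common = set.intersection(*sets)  # links continuously present over T steps
    return len(common) / len(sets[-1])
```

With $T = 1$ the multi-step measure reduces to the single-step one, and it can only decrease as $T$ grows, since each additional year can only remove links from the intersection.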

Our results clearly suggest the presence of a nucleus
of a few strongly connected stocks in the stock market that
form a stable structure over time. On the other hand, the
stocks that are less strongly connected form a much
larger network (a representative
Fig. 5. Annual dynamics of the network similarity after we
removed a fraction q of the initial links. (a) Single-step simi-
larity probability, $S^q(t)$, of the networks after we remove the
stronger links. (b) Single-step similarity probability, $S^q(t)$, of
the networks after we remove the weaker links. (c) Multi-step
similarity probability, $S^q_T(t)$, of the networks after we remove
the stronger links. (d) Multi-step similarity probability, $S^q_T(t)$,
of the networks after we remove the weaker links.

network of the whole market). The presence of these noisy 
connections in the network makes it almost impossible to 
predict the price movement of one stock only by using 
information about the price movement of another. 
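The three link-removal procedures and the quantities plotted in Fig. 4(a) to (c) can be sketched as follows (hypothetical function names, standard library only). Nodes that lose all their links simply disappear from the edge list, so the network is not forced to stay connected. Setting $C_i = 0$ for nodes with fewer than two neighbours is our assumption, since the text does not say how such nodes enter the average.

```python
import random
from itertools import combinations

def remove_links(dist, q, mode, seed=0):
    """Return the links left after removing a fraction q of all links.

    dist maps (i, j) -> d_ij for the fully connected network.
    mode="strong": remove low-d_ij (strongly correlated) links first.
    mode="weak":   remove high-d_ij (weakly correlated) links first.
    mode="random": remove links in a random order (reproducible via seed).
    """
    links = list(dist.items())
    if mode == "random":
        random.Random(seed).shuffle(links)
    elif mode in ("strong", "weak"):
        # Sort so that the links to be removed come first.
        links.sort(key=lambda item: item[1], reverse=(mode == "weak"))
    else:
        raise ValueError(f"unknown mode: {mode}")
    n_remove = round(q * len(links))
    return [pair for pair, _ in links[n_remove:]]

def measures(edges):
    """Largest and second largest cluster, kappa = <k^2>/<k> and clustering C."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Cluster sizes by depth-first search; nodes without links are absent from adj.
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    sizes.sort(reverse=True)
    degrees = [len(nb) for nb in adj.values()]
    # Zero-degree nodes add nothing to either sum, so kappa does not depend on them.
    kappa = sum(k * k for k in degrees) / sum(degrees) if degrees else 0.0
    local = []
    for u, nb in adj.items():
        k = len(nb)
        if k < 2:
            local.append(0.0)  # assumption: C_i = 0 with fewer than two neighbours
            continue
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        local.append(links / (k * (k - 1) / 2))
    return {
        "largest": sizes[0] if sizes else 0,
        "second": sizes[1] if len(sizes) > 1 else 0,
        "kappa": kappa,
        "C": sum(local) / len(local) if local else 0.0,
    }
```

Sweeping $q$ over a grid for each mode and recording `measures(remove_links(dist, q, mode))` gives curves of the kind shown in Fig. 4; for random removal, the point where `kappa` drops to about 2 marks the percolation transition discussed above.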



4 Discussion

We found that correlation based networks show great tolerance to the removal of the stronger links, since the network remains connected until we remove almost 99.9% of its original links. If the removal is targeted at the weaker links, it results in a faster removal of nodes, but leaves the remaining network highly connected and highly clustered. This shows that the tolerance of the network to random and to intentional attacks on strong links comes mostly from the connectivity provided by its weak links. This behavior is contrary to what happens in scale free networks, which show a large tolerance to random failures or attacks, but are highly vulnerable to intentional attacks [9,28,29,30].

Removal of the weak links from a correlation based network results in some shrinking of the network, but the properties of the remaining part are similar to the properties of the original network. This finding could explain why a financial market is not strongly affected by the small capitalization stocks, which are usually weakly connected. By contrast, it is strongly affected by the high capitalization stocks, which are strongly connected most of the time. This behavior could not be explained by a scale free topology, because in that case a targeted attack on the strongly correlated nodes would result in a breakdown of the connections of the network.

Summarising, our results suggest that there is a strong correlation between the topology of the network and the weights of the links, which in our case are the correlation strengths between the different stocks. The network disintegrates without a sharp percolation transition when we sequentially remove either its weaker or its stronger correlated links. Since the network lacks a natural cutoff, there is always some arbitrariness in the method used to filter out links of the original network, which might result in losing valuable information. However, since close to removing 99.5% of the weak links we identify clear structures, we used this threshold as our parameter, which indeed reveals the network's meaningful structure.
Table 1. Ticker symbols of the stocks belonging to the communities shown in Fig. 2. Stocks that appear in more than one
community are marked with special symbols.
We also find similar results by performing the same
analysis using closing prices from a different market, the
Athens Stock Exchange (ASE), for the period 1987 to 2004
(these results will be published elsewhere). This suggests
that our findings are general and do not depend on the
particular investigated system. However, we must keep in
mind that correlation based networks differ from
ordinary networks, because the link associated with each
interaction is estimated from the statistical evaluation
of the correlation coefficient. Therefore, one could use an
estimate of reliability, such as the bootstrap technique [21],
to obtain reliability values for all links of the communities
detected with the k-clique method, but such an analysis is
beyond the scope of this paper.